00100	AN ALGORITHM WHICH RECOGNIZES NATURAL LANGUAGE
00200	          DIALOGUE EXPRESSIONS
00300	
00400	
00500	
00600		COLBY AND PARKISON
00700	
00800	OUTLINE
00900		INTRODUCTORY -Discussion of language as code, other approaches
01000		              sentence versus word dictionary using projection
01100	                      rules to yield an interpretation from word definitions.
01200		              experience with old Parry.
01300		PROBLEMS -dialogue problems and methods. Constraints. Special cases.
01400		Preprocessing- dict words only
01500		              translations
01600	
01700	                     (ROGER- WHAT HAPPENED TO FILLERS?)
01800		              contractions
01900			      expansions
02000			      synonyms
02100			      negation
02200		Segmenting - prepositions, wh-words, meta-verbs
02300			    give list
02400		Matching - simple and compound patterns
02500			  association with semantic functions
02600			  first coarsening - drop fillers- give list
02700			  second coarsening - drop one word at a time
02800			  dangers of insertion and restoration
02900			Recycle condition- sometimes a pattern containing pronouns
03000			is matched, like "DO YOU AVOID THEM".  If THEM could be
03100			a number of different things and Parry's answer depends on
03200			which one it is, then the current value of the anaphora,
03300			THEM, is substituted for THEM and the resulting pattern
03400			is looked up.  Hopefully, this will produce a match to a
03500			more specific pattern, like "DO YOU AVOID MAFIA".
03600			  default condition - pass surface to memory
03700			                      change topic or level
03800		Advantages - real-time performance, pragmatic adequacy and
03900			           effectiveness, performance measures.
04000			    "learning" by adding patterns
04100			    PARRY1 ignored word order- penalty too great
04200			    PARRY1 too sequential taking first pattern it found
04300			       rather than looking at whole input and then deciding.
04400			   PARRY1 had patterns strung out throughout procedures
04500			       and thus cumbersome for programmer to see what patterns were.
04600		Limitations - typical failures, possible remedies
04700	                        NO NOUNS, ETC.- NO MORE GENERAL RULES LIKE NOUN PHRASE, 
04800	                        AMBIGUITIES SLIDE THROUGH WHEREAS PARSER WOULD CATCH THEM
04900		Summary
05000	
05100	INTRODUCTION
05200	
05300		To recognize is to identify   something as an instance of the
05400	"same  again".   This  familiarity  is  possible because of recurrent
05500	characteristics of the world which repeat themsleves  over  and  over
05600	again.   We  shall  describe  an algorithm which recognizes recurrent
05700	characteristics of natural language dialogue  expressions  through  a
05800	multi-stage  sequence  of  functions  which  progressively transforms
05900	these input expressions into a pattern which eventually best  matches
06000	a  more abstract stored pattern. The name of the stored pattern has a
06100	pointer to the name of a response function which decides what  to  do
06200	once  the input has been characterized.         Here we shall discuss
06300	only the recognizing functions,  except  for  one  response  function
06400	(anaphoric     substitution)    which    interactively    aids    the
06500	characterization process.  How the response functions operate will be
06600	described in a future communication (Faught, Colby, Parkison).
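	In outline, the recognition process is a pipeline: preprocess,
segment, match, and dispatch through the matched pattern's pointer to
a response function. The toy sketch below (in present-day Python,
purely illustrative; every name in it is invented, and the real
dictionary, segmenter, and pattern store are far larger) shows only
the shape of this control flow. Each stage is elaborated in the
sections that follow.
	# A toy skeleton of the recognition pipeline, purely illustrative.
	def preprocess(text):              # stands in for PREPROCESSING
	    return text.upper().rstrip("?.!").split()
	def segment(words):                # stands in for SEGMENTING
	    return [words]                 # pretend: one simple pattern
	def recognize(text, stored, responses):
	    pattern = tuple(segment(preprocess(text))[0])
	    name = stored.get(pattern)     # perfect match first; fuzzy later
	    if name is None:
	        return "DEFAULT: pass surface to memory, change topic"
	    return responses[name]()       # pattern name -> response function
	stored = {("HOW", "ARE", "YOU"): "ASK-HEALTH"}
	responses = {"ASK-HEALTH": lambda: "I AM ALL RIGHT I THINK."}
	print(recognize("How are you?", stored, responses))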
06700		In  constructing  and  testing  a  simulation   of   paranoid
06800	processes,  we  were  faced  with the problem of reproducing paranoid
06900	linguistic behavior in a  diagnostic  psychiatric  interview.     The
07000	diagnosis   of  paranoid  states,  reactions  or  modes  is  made  by
07100	clinicians who judge a degree of  correspondence  between  what  they
07200	observe  linguistically in an interview and their conceptual model of
07300	paranoid behavior. There exists a high degree of agreement about this
07400	conceptual  model which relies mainly on what an interviewee says and
07500	how he says it.
07600		Natural language is a life-expressing  code  people  use  for
07700	communication  with  themselves  and others.  In a real-life dialogue
07800	such as a psychiatric interview,  the  participants  have  interests,
07900	intentions,  and  expectations which are revealed in their linguistic
08000	expressions.    To produce effects on an interviewer which  he  would
08100	judge similar to the effects produced by a paranoid patient, an
08200	interactive  simulation  of  a  paranoid  patient  must  be  able  to
08300	demonstrate  typical  paranoid  interview  behavior.  To  achieve the
08400	desired effects, the paranoid model must have  the  ability  to  deal
08500	with the linguistic behavior of the interviewer.
08600		There are a number of approaches one might  consider  for  an
08700	ideal  handling  of dialogue expressions.       One approach would be
08800	to construct a dictionary of all  expressions  which  could  possibly
08900	arise  in an interview.  Associated with each expression would be its
09000	interpretation, depending on dialogue context,  which  in  turn  must
09100	somehow  be  defined. For obvious reasons, no one takes this approach
09200	seriously.     Instead  of  an  expression  dictionary,   one   might
09300	construct a word dictionary and then use projection rules to yield an
09400	interpretation of a sentence from the dictionary definitions.   This,
09500	has been the approach adopted by conventional linguistic parsers.
09600	Such  a method performs adequately as long as the dictionary involves
09700	only a few hundred words, each word having only one  or  two  senses,
09800	and the dialogue is limited to a mini-world  of  a  few  objects  and
09900	relations.    But the problems which arise in a psychiatric interview
10000	conducted in unrestricted English are too great for this method to be
10100	useful for real-time dialogues in which immediacy of response is an
10200	important requirement.
10300		There is little consensus knowledge about how humans  process
10400	natural  language.  They  can  be  shown to possess some knowledge of
10500	grammar rules but this does not entail that they  use  a  grammar  in
10600	interpreting   and  producing  language.  Irregular  verb-tenses  and
10700	noun-plurals do not follow rules; yet people use thousands  of  them.
10800	One   school   of  linguistics  believes  that  people  possess  full
10900	transformational grammars for processing language.  In our view  this
11000	position  seems  dubious.   Originally transformational grammars were
11100	not  designed  to  "understand"  a  large  subset  of  English;  they
11200	represented  a  set  of  axioms  for  deciding  whether  a  string is
11300	"grammatical". Efforts to use them for other purposes have  not  been
11400	fruitful.
11500		An analysis of what one's problem actually  is  should  guide
11600	the  selection  or invention of methods appropriate to that problem's
11700	solution.  Our problem was not to develop a  consistent  and  general
11800	theory  of  language  nor  to  assert empirically testable hypotheses
11900	about how people process language.   Our  problem  was  to  write  an
12000	algorithm  which recognizes what is being said in a dialogue and what
12100	is being said about it in order to make a response such that a sample
12200	of I-O pairs from the paranoid model is judged similar to a sample of
12300	I-O  pairs  from  paranoid  patients.     From  the  perspective   of
12400	artificial  intelligence  as  an  attempt to get computers to perform
12500	human intellectual tasks, methods had to be devised for the  task  of
12600	participating  in a human dialogue in a paranoid-patient-like way. We
12700	are not making an existence claim that our  strategy  represents  the
12800	way  people  process  language.   We sought efficacious methods which
12900	could operate efficiently in real time. We would not  deny  that  our
13000	methods  for  this  task are possibly analogous to the methods humans
13100	use. And since our methods provide a general way of mapping from many
13200	"surface"  expressions  to a single stored pattern, these methods are
13300	useful for any type of dialogue algorithm.
13400		To perform  the  task  of  managing  communicative  uses  and
13500	effects  of  natural  language, we adopted a strategy of transforming
13600	the input until a pattern is achieved  which  matches  completely  or
13700	partially  a  more abstract stored pattern.  This strategy has proved
13800	adequate for our purposes a satisfactory percentage of the time.  (No
13900	one  expects an algorithm to be successful 100% of the time since not
14000	even humans, the best natural language systems around,  achieve  this
14100	level of performance).  The power of this method for natural language
14200	dialogues lies in its ability to  ignore  unrecognizable  expressions
14300	and  irrelevant  details.    A conventional parser doing word-by-word
14400	analysis fails when it cannot find one or more of the input words  in
14500	its dictionary. It is too fragile for a dialogue in that it must know
14600	every word; it cannot guess.
14700		In early versions of the paranoid model, (PARRY1), many  (but
14800	not all) of the pattern recognition mechanisms were weak because they
14900	allowed the elements of the pattern to be order  independent.     For
15000	example, consider the following expressions:
15100		(1) WHERE DO YOU WORK?
15200		(2) WHAT SORT OF WORK DO YOU DO? 
15300		(3) WHAT IS YOUR OCCUPATION? 
15400		(4) WHAT DO YOU DO FOR A LIVING? 
15500		(5) WHERE ARE YOU EMPLOYED? 
15600	In PARRY1 a procedure would scan these  expressions  looking  for  an
15700	information-bearing contentive such as "work", "for a  living",  etc.
15800	If  it  found  such  a contentive along with a "you" or "your" in the
15900	expression, regardless  of  word  order,  it  would  respond  to  the
16000	expression  as  if it were a question about the nature of one's work.
16100	(There  is  some  doubt  this  even  qualifies  as  a  pattern  since
16200	interrelations  between  words are ignored and only their presence is
16300	considered).    An insensitivity to word order has the advantage that
16400	lexical  items  representing  different parts of speech can represent
16500	the same concept, e.g. "work" as a noun or as a verb. But a price is paid
16600	for  this  resilience  and elasticity. We found from experience that,
16700	since English relies heavily on word order to convey the meaning of
16800	its messages, the penalty of misunderstanding (to be distinguished
16900	from ununderstanding) was too great.  Hence in PARRY2, as will be
17000	described shortly, all the patterns require a specified word order.
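	The difference is easy to make concrete. In the illustrative
sketch below (invented helper names, not PARRY code), a presence-only
test accepts a scrambled expression that an ordered pattern correctly
rejects.
	# Presence-only matching (PARRY1 style) versus ordered matching
	# (PARRY2 style).
	def unordered_match(words, required):
	    return set(required) <= set(words)    # word order ignored
	def ordered_match(words, required):
	    return words == required              # word order enforced
	q1 = ["WHERE", "DO", "YOU", "WORK"]
	q2 = ["YOU", "DO", "WORK", "WHERE"]       # scrambled order
	print(unordered_match(q1, ["YOU", "WORK"]))               # True
	print(unordered_match(q2, ["YOU", "WORK"]))               # True: risk
	print(ordered_match(q1, ["WHERE", "DO", "YOU", "WORK"]))  # True
	print(ordered_match(q2, ["WHERE", "DO", "YOU", "WORK"]))  # False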
17100		It  seems  agreed  that  for  high-complexity  problems it is
17200	useful to have constraints.       Diagnostic  psychiatric  interviews
17300	(and  especially those conducted over teletypes) have several natural
17400	constraints.  First, clinicians are trained to ask certain  questions
17500	in  certain  ways. These stereotypes can be treated as special cases.
17600	Second, only  a  few  hundred  standard  topics  are  brought  up  by
17700	interviewers   who  are  trained  to  use  everyday  expressions  and
17800	especially those used by the patient himself. When the  interview  is
17900	conducted over teletypes, expressions tend to be shortened since the
18000	interviewer  tries to increase the information transmission rate over
18100	the slow channel of a teletype.  (It is said that  short  expressions
18200	are  more  grammatical  but  think  about  the phrase "Now now, there
18300	there.") Finally,  teletyped  interviews  represent  written  speech.
18400	Speech  is  known  to be 50% redundant; hence many unrecognized words
18500	can be ignored without losing the meaning  of  the  message.  Written
18600	speech  is  loaded  with idioms, cliches, pat phrases, etc.     - all
18700	being easy prey for a pattern-recognition approach.  It is futile  to
18800	try  to  decode  an idiom by analyzing the meanings of its individual
18900	words. One knows what an idiom means or one does not.
19000		We shall now describe the pattern recognition functions of the 
19100	algorithm.
19200	
19300			PREPROCESSING
19400	
19500		Each  word  in  the  input expression is first looked up in a
19600	dictionary of (1300) words which remains in core in machine language.
19700	The  dictionary consists of a list of words and names of word-classes
19800	they can be translated into. The size of the dictionary is determined
19900	by  the patterns, i.e.    each dictionary word appears in one or more
20000	patterns.  (ROGER- SHOW PIECE  OF  DICT?)  Words  in  the  dictionary
20100	reflect PARRY2's main interests. If a word in the input is not in the
20200	dictionary, it is dropped from the pattern being formed. Thus if  the
20300	input were:
20400		WHAT IS YOUR CURRENT OCCUPATION?
20500	and the word "current" is not in the dictionary, the pattern at  this
20600	stage becomes:
20700		(WHAT IS YOUR OCCUPATION)   
20800	The  question-mark  is  thrown  away as redundant since questions are
20900	recognized by word order. (A statement followed by  a  question  mark
21000	(YOU  GAMBLE?)  is considered to be communicatively equivalent in its
21100	effects  to  that  statement  followed  by   a   period.)   Synonymic
21200	translations  of  words  are  made  so  that the pattern becomes, for
21300	example:
21400		(WHAT  BE  YOU  JOB)   
21500	Groups  of  words  are  translated  into a single word-class name, so
21600	that, for example, "for a living" becomes "job".
21700		Misspellings  are  also  handled   in   the   dictionary   by
21800	simply rewriting a recognized misspelling into its correct form. Thus
21900	"yyou" becomes "you". The common misspellings were gathered from over
22000	4000   interviews   with   versions  of  the  paranoid  model.  Other
22100	misspellings do not appear  in  the  pattern  because  they  are  not
22200	represented in the dictionary.
22300		Certain  juxtaposed  words  are  contracted  into  a   single
22400	word, e.g. "GET ALONG WITH" becomes "GETALONGWITH". This is done (1)
22500	to deal with groups of  words  which  are  represented  as  a  single
22600	element  in  the stored pattern, and (2) to prevent segmentation from
22700	occurring at the wrong places, such as at  a  preposition  inside  an
22800	idiom.    Besides  these contractions, certain expansions are made so
22900	that, for example, "DON'T" becomes "DO NOT" and "I'D" becomes "I
23000	WOULD".
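	In outline, the preprocessing steps compose as follows. The
sketch is illustrative Python, with toy word lists standing in for the
1300-word dictionary and its translation tables.
	# Toy preprocessing: expansions, misspelling repair, contraction of
	# word groups, synonym translation, and dropping of unknown words.
	EXPANSIONS  = {"DON'T": "DO NOT", "I'D": "I WOULD"}
	SPELLFIX    = {"YYOU": "YOU"}
	CONTRACTION = {("GET", "ALONG", "WITH"): "GETALONGWITH",
	               ("FOR", "A", "LIVING"): "JOB"}
	DICTIONARY  = {"WHAT": "WHAT", "IS": "BE", "ARE": "BE", "YOU": "YOU",
	               "YOUR": "YOU", "OCCUPATION": "JOB", "JOB": "JOB",
	               "GETALONGWITH": "GETALONGWITH"}
	def preprocess(text):
	    words = text.upper().replace("?", "").split()
	    words = [SPELLFIX.get(w, w) for w in words]
	    words = " ".join(EXPANSIONS.get(w, w) for w in words).split()
	    out, i = [], 0
	    while i < len(words):                  # greedy group contraction
	        for n in (3, 2):
	            group = tuple(words[i:i + n])
	            if group in CONTRACTION:
	                out.append(CONTRACTION[group])
	                i += n
	                break
	        else:
	            out.append(words[i])
	            i += 1
	    # translate via the dictionary; unknown words simply drop out
	    return [DICTIONARY[w] for w in out if w in DICTIONARY]
	print(preprocess("What is your current occupation?"))
	# -> ['WHAT', 'BE', 'YOU', 'JOB']  ("current" is not in the dictionary)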
23100			SEGMENTING
23200	
23300		Another weakness in the crude pattern matching of PARRY1  was
23400	that  it  took  the  entire  input expression as its basic processing
23500	unit. Hence if only two words were recognized in an eight-word input,
23600	the  risk  of  misunderstanding was great. We needed a way of dealing
23700	with units shorter than the entire input expression.
23800		Expert  telegraphists  stay  six  to  twelve  words  behind a
23900	received message before transcribing it (Bryan and Harter, 1897).
24000	Translators  wait  until  they have heard 4-6 words before they begin
24100	translating. Aided by a heuristic from  machine-translation  work  by
24200	Wilks  ( ), we devised a way of bracketing the pattern constructed up
24300	to this point into shorter segments using the list of words in Fig.1.
24400	The   new  pattern  formed  is  termed  either  "simple",  having  no
24500	delimiters within it, or "compound", i.e. being made up of two or more
24600	simple patterns. A simple pattern might be:
24700		( WHAT BE YOU JOB )
24800	whereas a compound pattern would be:
24900		(( WHY BE YOU ) ( IN HOSPITAL ))
25000	Our experience with this method of segmentation shows  that  compound
25100	patterns from psychiatric dialogues rarely consist of more than three
25200	or four fragments.
25300	       After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
25400	to replace the commonly omitted "THAT", such that:
25500		( I THINK YOU BE AFRAID )
25600	becomes
25700		(( I THINK ) ( YOU BE AFRAID ))
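	A toy version of the segmenter follows (illustrative Python; the
delimiter list here is a small invented stand-in for the list of
Fig. 1).
	# Break the preprocessed pattern at delimiter words (prepositions,
	# wh-words, conjunctions); meta-verbs such as "THINK" close their
	# segment, replacing the commonly omitted "THAT".
	DELIMITERS = {"IN", "ON", "TO", "WHY", "WHAT", "HOW", "AND", "BUT"}
	META_VERBS = {"THINK", "FEEL", "BELIEVE"}
	def segment(words):
	    segments, current = [], []
	    for w in words:
	        if w in DELIMITERS and current:    # delimiter opens a segment
	            segments.append(current)
	            current = []
	        current.append(w)
	        if w in META_VERBS:                # bracket after "THINK" etc.
	            segments.append(current)
	            current = []
	    if current:
	        segments.append(current)
	    return segments
	print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
	# -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
	print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
	# -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]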
25800	
25900			PREPARATION FOR MATCHING
26000	
26100		Conjunctions serve only as markers for the segmenter and they
26200	are dropped out after segmentation.
26300		Negations are  handled  by  extracting  the  "NOT"  from  the
26400	pattern and assigning a value to a global variable which indicates to
26500	the algorithm that the expression is negative in form. When a pattern
26600	is  finally matched, this variable is consulted. Some patterns have a
26700	pointer to a pattern of opposite meaning if  a  "NOT"  could  reverse
26800	their  meanings.     If this pointer is present and a "NOT" is found,
26900	then the pattern matched is replaced by its opposite, e.g.  (  I  not
27000	trust  you  )  is replaced by the pattern ( I mistrust you ). We have
27100	not yet observed the troublesome case of "he gave me not one but  two
27200	messages". (There is no need to scratch where it doesn't itch).
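	In sketch form (illustrative Python; the opposites table is an
invented fragment):
	# Strip "NOT" from the segment, set a flag, and consult a table of
	# opposite-meaning patterns after matching.
	OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}
	def extract_not(segment):
	    negated = "NOT" in segment
	    return [w for w in segment if w != "NOT"], negated
	pattern, negated = extract_not(["I", "NOT", "TRUST", "YOU"])
	if negated and tuple(pattern) in OPPOSITES:
	    pattern = list(OPPOSITES[tuple(pattern)])
	print(pattern)   # -> ['I', 'MISTRUST', 'YOU']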
27300	
27400			MATCHING AND RECYCLING
27500	
27600		The algorithm now attempts to match  the  segmented  patterns
27700	with  stored  patterns  which  are  currently  about  2000 in number.
27800	First a complete and perfect match is sought.  When a match is found,
27900	the  stored  pattern  name  has  a  pointer to the name of a response
28000	function which decides what to do further.  If a match is not  found,
28100	further  transformations of the pattern are carried out and a "fuzzy"
28200	match is tried.
28300		For fuzzy matching at this stage, the elements in the pattern
28400	are dropped one at a time and a  match  attempted  each  time.   This
28500	allows ignoring familiar words in unfamiliar contexts.   For example,
28600	"well" is important in "Are you well?" but meaningless in  "Well  are
28700	you?".  
28800	 	Deleting one element at a time results in, for example, the
28900	pattern:
29000			( what be you main problem )
29100	becoming successively:
29200			(a) ( be you main problem )
29300			(b) ( what you main problem )
29400			(c) ( what be main problem )
29500			(d) ( what be you problem )
29600			(e) ( what be you main )
29700	Since the stored pattern in this case matches (d), (e) would  not  be
29800	constructed.   We  found  it  unwise  to delete more than one element
29900	since our segmentation method usually yields  segments  containing  a
30000	small (1-4) number of words.
30100		Dropping  an  element  at  a  time  provides  a   probability
30200	threshold  for  partial matching which is a function of the length of
30300	the segment. If a segment consists of five elements, four of the five
30400	must be present in a particular order (with the fifth element missing
30500	in any position) for a match to occur. If  a  segment  contains  four
30600	elements, three must match - and so forth.
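	The elementwise fuzzy match can be sketched as follows
(illustrative Python; the stored-pattern table is an invented fragment
of the roughly 2000 real ones):
	# Try a perfect match, then drop one element at a time and retry.
	# At most one element may be absent, so a five-element segment needs
	# four elements present in order, a four-element segment three, etc.
	STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "ASK-PROBLEM"}
	def fuzzy_match(segment):
	    key = tuple(segment)
	    if key in STORED:                      # complete, perfect match
	        return STORED[key]
	    for i in range(len(segment)):          # drop element i and retry
	        reduced = key[:i] + key[i + 1:]
	        if reduced in STORED:
	            return STORED[reduced]
	    return None                            # default condition
	print(fuzzy_match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"]))
	# -> 'ASK-PROBLEM'  (match found when "MAIN" is deleted)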
30700	        The transformations described above result in a progressive
30900	coarsening of the patterns by deletion. Substitutions are  also  made
31000	in  certain  cases.  Some patterns contain pronouns which could stand
31100	for a number of different  things  of  importance  to   PARRY2.   The
31200	pattern:
31300		( DO YOU AVOID THEM )
31400	could  refer  to  the Mafia, or racetracks, or other patients.   When
31500	such a pattern is recognized, the pronoun is replaced by its  current
31600	anaphoric  value  as determined by the response functions, and a more
31700	specific pattern such as:
31800		( DO YOU AVOID MAFIA )
31900	is  looked  up.  In many cases, the meaning of a pattern containing a
32000	pronoun is clear without any substitution.  In the pattern:
32100		(( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
32200	the meaning of THEY is clarified by ( IN HOSPITAL ).
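	A sketch of this recycle step (illustrative Python; the anaphora
table shown would in fact be maintained by the response functions):
	# Substitute the current anaphoric value for a pronoun and look the
	# more specific pattern up again.
	STORED = {("DO", "YOU", "AVOID", "MAFIA"): "ASK-AVOID-MAFIA"}
	ANAPHORA = {"THEM": "MAFIA"}               # current value of THEM
	def recycle(segment):
	    specific = tuple(ANAPHORA.get(w, w) for w in segment)
	    return STORED.get(specific)
	print(recycle(["DO", "YOU", "AVOID", "THEM"]))  # -> 'ASK-AVOID-MAFIA'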
32300	
32400			COMPOUND-PATTERN MATCH
32500	
32600		When more than one simple pattern is detected in the input, a
32700	second matching is attempted.  The methods used are  similar  to  the
32800	first  matching except they occur at the segment level rather than at
32900	the single element level. Certain patterns, such as ( HELLO ) and ( I
33000	THINK  ),  are dropped because they are considered meaningless.  If a
33100	complete match is not found, then simple patterns are dropped, one at
33200	a time, from the compound pattern. This allows the input,
33300		(( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
33400	to  match  the  stored  pattern,
33500		(( HOW  DO  YOU COME ) ( IN HOSPITAL )).
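	The segment-level match thus mirrors the element-level one. A toy
sketch (illustrative Python; the tables are invented fragments):
	# Discard meaningless segments, then drop one simple pattern at a
	# time from the compound pattern and retry.
	MEANINGLESS = {("HELLO",), ("I", "THINK")}
	STORED = {(("HOW", "DO", "YOU", "COME"),
	           ("IN", "HOSPITAL")): "ASK-WHY-HERE"}
	def compound_match(segments):
	    segs = tuple(s for s in segments if s not in MEANINGLESS)
	    if segs in STORED:
	        return STORED[segs]
	    for i in range(len(segs)):             # drop one segment, retry
	        reduced = segs[:i] + segs[i + 1:]
	        if reduced in STORED:
	            return STORED[reduced]
	    return None                            # default condition
	print(compound_match((("HOW", "DO", "YOU", "COME"),
	                      ("TO", "BE"), ("IN", "HOSPITAL"))))
	# -> 'ASK-WHY-HERE'  (match found when ( TO BE ) is dropped)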
33600	
33700		If  no  match  can  be found at this point, the algorithm has
33800	arrived at a default condition and the appropriate response functions
33900	decide  what  to  do.  For example, in a default condition, the model
34000	may assume  control  of  the  interview,  asking  the  interviewer  a
34100	question, continuing with the topic under discussion or introducing a
34200	new topic.
34300	
34400		ADVANTAGES AND LIMITATIONS
34500	
34600		As   mentioned,   one   of   the   main   advantages   of   a
34700	characterization strategy is that it can ignore as irrelevant what it
34800	does  NOT  recognize.     There are several million words in English,
34900	each  possessing  one  to  one  hundred  senses.   To   construct   a
35000	machine-usable  word  dictionary  of  this  magnitude  is  out of the
35100	question at this time.    Recognition of natural language input as
35200	described above allows real-time interaction in a dialogue since
35300	it avoids becoming ensnarled  in  combinatorial  disambiguations  and
35400	long chains of inferencing which would slow a dialogue algorithm down
35500	to impracticality, if it could even function at all. The  price  paid
35600	for  pattern matching is that sometimes, but rarely, ambiguities slip
35700	through.
35800		A drawback to PARRY1 was that it reacted to the first pattern
35900	it  found  in the input rather than characterizing the input as fully
36000	as possible and then deciding what to do based on a number of  tests.
36100	Another   practical   difficulty  with  PARRY1  from  a  programmer's
36200	viewpoint, was that elements of  the  patterns  were  strung  out  in
36300	various  procedures  throughout  the  algorithm.    It  was  often  a
36400	considerable chore for the programmer to determine  whether  a  given
36500	pattern   was   present   and   precisely   where   it  was.  In  the
36600	above-described method, the patterns are all collected in one part of
36700	the data-base where they can easily be examined.
36800		Concentrating  all the patterns in the data base gives PARRY2
36900	a limited "learning" ability.  When  an  input  fails  to  match  any
37000	stored  pattern  or  matches  an  incorrect one, as judged by a human
37100	operator, a pattern matching the input can be put into the  data-base
37200	automatically.  If  the  new  pattern  has  the  same  meaning  as  a
37300	previously stored pattern, the human operator must provide  the  name
37400	of  the  appropriate  response  function.  If he doesn't remember the
37500	name, he may try to rephrase the input  in  a  form  recognizable  to
37600	PARRY2  and  it  will  name the response function associated with the
37700	rephrasing.  These mechanisms are not "learning" in the commonly used
37800	sense  but  they  do  allow  a  person to transfer his knowledge into
37900	PARRY2's data-base with very little redundant effort.
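	A toy rendering of this mechanism (illustrative Python; all names
invented):
	# When an input fails, a human operator either names the response
	# function directly or rephrases the input in a form the program
	# already recognizes, and the new pattern inherits that function.
	STORED = {("WHAT", "BE", "YOU", "JOB"): "ASK-JOB"}
	def add_pattern(new_pattern, response_name=None, rephrasing=None):
	    if response_name is None and rephrasing is not None:
	        response_name = STORED.get(tuple(rephrasing))
	    if response_name is not None:
	        STORED[tuple(new_pattern)] = response_name
	    return response_name
	# Operator doesn't recall the name, so he rephrases instead:
	add_pattern(["WHERE", "BE", "YOU", "EMPLOY"],
	            rephrasing=["WHAT", "BE", "YOU", "JOB"])
	print(STORED[("WHERE", "BE", "YOU", "EMPLOY")])   # -> 'ASK-JOB'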
38000			PERFORMANCE
38100		We have a number of performance measures on  PARRY1  along  a
38200	number  of  dimensions including "linguistic non-comprehension". That
38300	is, judges estimated PARRY1's abilities along this dimension on a 0-9
38400	scale.   They  also  rated  human  patients and a "random" version of
38500	PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS.)  We  have
38600	collected  ratings of PARRY2 along this dimension to determine if the
38700	characterization  process  represents  an  improvement  over  PARRY1.
38800	(FRANK AND KEN EXPERIMENT).